Abstract: This research focuses on the consistent, robust
least squares dummy variable (LSDV_R) estimator, which is
predicated on correcting the bias arising from the inconsistency
of the least squares dummy variable (LSDV) estimator of the
parameters of the dynamic panel data model, as an extension of
earlier results. We compared the bias corrected least squares
dummy variable estimator of the dynamic panel data model in
the presence of outliers, at stated specifications of the model,
with the consistent instrumental variable (IV) and generalized
method of moments (GMM) estimators of Anderson and Hsiao
(AH), Arellano and Bond (AB) and Blundell and Bond (BB), to
validate the claims or otherwise of the estimators. We observe at
$\rho = \gamma = 0.8$ and $\beta = 0.2$ that the robust least squares
dummy variable estimator (LSDV_R) performs better than the
IV-GMM estimators in finite and large samples in terms of
predictive power, and in the estimation of the autoregressive
coefficient in large samples, followed by the LSDV, though with
the maximum RMSE. The Blundell and Bond (BB) estimator performs
better than the other contending models in estimating the
autoregressive coefficient in finite samples. The presence of an
outlier therefore does not affect the predictive power of the
robust least squares dummy variable (LSDV_R) estimator.

Index Terms: Consistent estimator, dynamic, outliers, panel
data model

I. INTRODUCTION 

Following the exposition by [1] that the least squares dummy
variable (LSDV) estimator is inconsistent for the parameters of
a first order autoregressive panel data model at a finite time
period, T, as the number of cross-sectional units, N, becomes
infinitely large, certain instrumental variable (IV) and
generalized method of moments (GMM) estimators have been
proposed in the econometric literature in the accounts of [2],
[3] and [4].

However, the IV-GMM estimators, which include the Anderson
and Hsiao (AH) instrumental variable estimator, the Arellano
and Bond (AB) generalized method of moments estimator and the
Blundell and Bond (BB) system generalized method of moments
estimator, could not cure all the problems inherent in the
model. These problems arise from the violation of the
assumption of no correlation between the explanatory variable
and the error term, a condition under which the ordinary least
squares, and hence the least squares dummy variable, estimator
is both consistent and efficient. In the accounts of [5], [6],
[7], [8] and [9], the system GMM was proposed because of the
weakness of the first-differenced IV-GMM estimators, which
suffer small sample bias due to weak instruments. In clear
terms, all the IV-GMM estimators maintain their consistency,
with highly persistent data, when the number of cross-section
units, N, is large, but they can be severely biased in small
samples, as in [5]. In large samples, the LSDV estimator,
though inconsistent, has a small variance relative to the
IV-GMM methods, as observed by [1], [10] and [11]. That the
LSDV may be consistent in large samples in the direction of T
is supported by [5] and [7], but according to [12] it has higher
variance than the IV-GMM estimators in small samples with
highly persistent data.

Also, for highly persistent data, the bias corrected least
squares dummy variable (LSDV_R) estimation of a first order
autoregressive panel data model emerged in the accounts of
[1], [13] and [7]. It involves approximating the bias of the
least squares dummy variable estimator and removing it, to
produce an estimator that could be consistent in both large and
small samples.

[1] and [13] used higher order asymptotic expansion
techniques, of order $N^{-1}T^{-1}$ and $N^{-1}T^{-2}$
respectively, to obtain the small sample bias of the LSDV
under the assumption of homoscedasticity. [14] obtained the
bias corrected LSDV estimator for the case of cross-section
heteroscedasticity. [7] obtained the bias corrected least
squares dummy variable estimator under the assumption of
homoscedasticity and also worked on the bias correction of the
LSDV in the case of time series and cross-section
heteroscedasticity, an extension of the work by [14].

This paper is a further extension of the work of [14] and [7]
and deals strictly with the comparison of the performances of
the LSDV, LSDV_R, AH, AB and BB estimators in the presence
of an outlier. It is an effort in search of further supportive
evidence on their performances in the first order
autoregressive panel data model, a literature that is still
evolving.

A. Weak Instrument 

An instrumental variable is a proxy which is highly
correlated with the included endogenous variable in the
dynamic panel data model but uncorrelated with the error
term. The strength of the correlation can be determined using
the F-statistic, since both the instrument and the instrumented
variable are observable.
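As an illustration of the F-statistic check described above, the following Python sketch computes a first-stage F-statistic for one endogenous regressor. It is not part of the original study: the function name `first_stage_f`, the simulated data and the absence of other exogenous controls are assumptions made purely for illustration.

```python
import numpy as np

def first_stage_f(endog, instruments):
    """F-statistic for H0: all instrument coefficients are zero in the
    first-stage regression of the endogenous regressor on the instruments.
    An intercept is added; no other exogenous controls are assumed."""
    endog = np.asarray(endog, dtype=float)
    Z = np.asarray(instruments, dtype=float)
    if Z.ndim == 1:
        Z = Z[:, None]
    n, q = Z.shape
    Z1 = np.column_stack([np.ones(n), Z])
    coef, *_ = np.linalg.lstsq(Z1, endog, rcond=None)
    resid = endog - Z1 @ coef
    rss = resid @ resid                              # unrestricted residual sum of squares
    tss = np.sum((endog - endog.mean()) ** 2)        # restricted (intercept only)
    return ((tss - rss) / q) / (rss / (n - q - 1))

# Hypothetical data: one instrument, one strongly and one weakly related regressor
rng = np.random.default_rng(0)
z = rng.normal(size=500)
strong = 0.8 * z + rng.normal(size=500)
weak = 0.02 * z + rng.normal(size=500)
f_strong = first_stage_f(strong, z)
f_weak = first_stage_f(weak, z)
```

A small first-stage F-statistic, as expected for `weak` here, signals that the instrument explains little of the instrumented variable.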

In the first order autoregressive panel data model given by

$$y_{it} = \gamma y_{i,t-1} + x_{it}'\beta + v_{it}$$

where $v_{it} = \alpha_i + \varepsilon_{it}$, the test to determine the
absence of correlation between the instrument and the error
term is conducted using the Sargan-Hansen test statistic,
calculated as $TR^2$, where $R^2$ is the coefficient of multiple
determination obtained by regressing the OLS residuals on the
exogenous variables and T is the number of periods, as in [4].
In a system of linear models, the test is not feasible in an
exactly identified model but only in over-identified models,
where it is expected that the instruments are truly exogenous
and uncorrelated with the error term, according to [15].
Instruments that are correlated with the error term, or that
are poorly correlated with the endogenous explanatory variable,
can make the estimates inconsistent and are thus regarded as
weak instruments. A weak instrument produces wrong estimates
of the parameters and their standard errors, while a good
instrument is highly correlated with the instrumented variable
and uncorrelated with the error, and so produces correct
estimates of the parameters and their standard errors.
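The over-identification test described above can be sketched in Python as follows. This is only a minimal illustration under stated assumptions: it uses two-stage least squares on simulated cross-section data, multiplies the $R^2$ by the number of observations `n` (the usual textbook form of the statistic), and the function name `sargan_statistic` and all data are hypothetical.

```python
import numpy as np

def sargan_statistic(y, X, Z):
    """2SLS of y on X with instruments Z, then n * R^2 from regressing the
    2SLS residuals on Z. Z must have more columns than X (over-identified).
    The R^2 is uncentred, which equals the usual R^2 when X and Z both
    contain a constant."""
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float)
    Z = np.asarray(Z, dtype=float)
    n = y.shape[0]
    X_hat = Z @ np.linalg.lstsq(Z, X, rcond=None)[0]     # first stage
    beta = np.linalg.lstsq(X_hat, y, rcond=None)[0]      # second stage
    u = y - X @ beta                                     # residuals use the original X
    u_fit = Z @ np.linalg.lstsq(Z, u, rcond=None)[0]
    r2 = (u_fit @ u_fit) / (u @ u)
    return n * r2, Z.shape[1] - X.shape[1]               # statistic, degrees of freedom

rng = np.random.default_rng(3)
n = 2000
z1, z2, v = rng.normal(size=(3, n))
u = 0.5 * v + rng.normal(size=n)           # error correlated with x through v
x = z1 + z2 + v                            # endogenous regressor
y = 1.0 + 2.0 * x + u
X = np.column_stack([np.ones(n), x])
Z_valid = np.column_stack([np.ones(n), z1, z2])
Z_invalid = np.column_stack([np.ones(n), z1, z2 + u])    # instrument contaminated by the error
stat_valid, dof = sargan_statistic(y, X, Z_valid)
stat_invalid, _ = sargan_statistic(y, X, Z_invalid)
```

With valid instruments the statistic is compared with a chi-square distribution on `dof` degrees of freedom; an instrument correlated with the error inflates it.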

B. Generalized Method of Moments ( GMM) Estimator 

Generalized method of moments (GMM) estimation is the
application of instrumental variables to an over-identified
model, i.e. a model in which the number of instruments is
greater than the number of covariates in the equation of
interest. When the number of instruments equals the number of
covariates, the model is just identified and GMM reduces to the
instrumental variable estimator. In other words, the GMM is a
generalization of the just-identified instrumental variable
estimator.

The danger of the instrumental variable method is that it may
be difficult to find a good instrument, and the instruments
used may introduce multicollinearity. Hence [4] and [5]
suggested the use of the maximum likelihood estimation method,
with its methodological and practical limitations, especially
in data involving large cross sections.

C. Heteroscedasticity 

Heteroscedasticity is the presence of unequal variance of the
error term in a model. It violates the assumption of equal
error variance (homoscedasticity) under which OLS is BLUE, and
it is worth correcting only when it is severe, as in [4]. In
the presence of heteroscedasticity, OLS remains linear and
unbiased but is no longer efficient, and hence no longer BLUE.
The OLS estimates of the coefficients are unbiased, but their
standard errors may be biased, in the accounts of [4] and [16].
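The effect of heteroscedasticity on the standard errors can be illustrated with a short Python sketch. It is an illustration only, on simulated data, comparing the conventional OLS standard errors with White's heteroscedasticity-consistent (HC0) standard errors; the function name `ols_with_se` and the data-generating choices are assumptions and do not come from the original study.

```python
import numpy as np

def ols_with_se(X, y):
    """OLS coefficients with conventional and White (HC0) standard errors."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, k = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    e = y - X @ beta
    # Conventional: assumes one common error variance
    se_conv = np.sqrt(np.diag(XtX_inv) * (e @ e) / (n - k))
    # HC0 sandwich: X' diag(e^2) X in the middle
    meat = X.T @ (X * (e ** 2)[:, None])
    se_hc0 = np.sqrt(np.diag(XtX_inv @ meat @ XtX_inv))
    return beta, se_conv, se_hc0

rng = np.random.default_rng(1)
n = 5000
x = rng.uniform(1.0, 5.0, size=n)
y = 1.0 + 0.5 * x + rng.normal(size=n) * x ** 2    # error spread grows with x
X = np.column_stack([np.ones(n), x])
beta, se_conv, se_hc0 = ols_with_se(X, y)
```

In this design the conventional standard error of the slope understates its sampling variability, while the coefficient estimates themselves remain centred on the true values.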

D. Outlier 

An outlier is an observation that is much different in
magnitude, i.e. either very large or very small, compared to
the other observations within the same sample. In other words,
outliers are perceived to be from a population other than that
from which the other sample observations are generated, as in
[4] and [15]. Outliers could be a source of heteroscedasticity.
They could also be a result of the unobservable individual
effects in a panel data study, such as the effects of
government policies, available resources and their uses, the
political will of the leader, the level of patriotism of the
citizenry and, generally, the individual attributes of the
units of a cross-section.


II. APPROXIMATING THE INCONSISTENCY OF THE LSDV

$$y_{it} = \gamma y_{i,t-1} + x_{it}'\beta + \alpha_i + \varepsilon_{it}, \quad i = 1, 2, \ldots, N; \; t = 1, 2, \ldots, T \tag{1}$$

where $y_{it}$ is the value of y for the ith individual or group at
period t, a $T \times 1$ vector of the dependent variable; $y_{i,t-1}$
is the value of y in the immediate past period t-1 for the ith
cross-section unit or group; $x_{it}$ is the value of the exogenous
explanatory variable at period t for the ith cross-section unit
or group, an $((N-1) \times 1)$ vector; $\alpha_i$ is the unobserved
ith unit or group effect; and $\varepsilon_{it}$ is an error term
that has mean zero and variance $\sigma_\varepsilon^2$.

Consider the LSDV obtained by within estimation, i.e. by the
application of ordinary least squares to the transformed
model

$$y = \gamma y_{-1} + X\beta + \varepsilon \tag{2}$$

such that

$$\hat{H} = (W'W)^{-1}W'y \tag{3}$$

and

$$W = [y_{-1}, X] \tag{4}$$

where $y_{-1}$ and $X$ are observations that have been centered and
stacked over time, such that $y_{-1}$ is an $(NT \times 1)$ vector of
the lagged endogenous explanatory variable and $X$ is an
$(NT \times 1)$ vector of strictly exogenous explanatory variables,
in the accounts of [7], and

$$H = (\gamma, \beta)' \tag{5}$$

$H$ is an $((N+1) \times 1)$ vector of coefficients as in [5].
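The within estimation in (2) to (4) can be sketched in Python as below. This is a minimal illustration with a single exogenous regressor; the function name `lsdv_within`, the array layout and the simulated data are assumptions and are not taken from the original study.

```python
import numpy as np

def lsdv_within(y, x):
    """Within (LSDV) estimate of (gamma, beta) in
    y_it = gamma * y_{i,t-1} + beta * x_it + alpha_i + eps_it.
    y and x have shape (N, T + 1); column 0 of y holds the start-up value
    y_i0 and column 0 of x is ignored."""
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    y_cur, y_lag, x_cur = y[:, 1:], y[:, :-1], x[:, 1:]

    def demean(a):
        # Centre each unit's observations on that unit's time mean
        return a - a.mean(axis=1, keepdims=True)

    W = np.column_stack([demean(y_lag).ravel(), demean(x_cur).ravel()])
    H_hat, *_ = np.linalg.lstsq(W, demean(y_cur).ravel(), rcond=None)
    return H_hat

# Hypothetical short panel: many units, few periods
rng = np.random.default_rng(2)
N, T, gamma, beta = 2000, 5, 0.5, 0.2
alpha = rng.normal(size=N)
x = rng.normal(size=(N, T + 1))
y = np.zeros((N, T + 1))
y[:, 0] = alpha / (1 - gamma) + rng.normal(size=N)
for t in range(1, T + 1):
    y[:, t] = gamma * y[:, t - 1] + beta * x[:, t] + alpha + rng.normal(size=N)
gamma_hat, beta_hat = lsdv_within(y, x)
```

With a short time dimension such as this, the estimate of $\gamma$ falls below the true value even though N is large, which is the inconsistency examined in this section.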

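The inconsistency of the LSDV at finite period, T, and large
number of cross-section units, N, is evidenced by

$$\operatorname{Cov}\left[(y_{i,t-1} - \bar{y}_{i,-1}),\, (\varepsilon_{it} - \bar{\varepsilon}_i)\right] \neq 0 \tag{6}$$

and can be estimated using the partitioned regression
technique, in line with [7], for the estimation errors of
$\gamma$ and $\beta$ as

$$\hat{\gamma} - \gamma = (y_{-1}'Dy_{-1})^{-1}y_{-1}'D\varepsilon \tag{7}$$

$$\hat{\beta} - \beta = -(X'X)^{-1}X'y_{-1}(\hat{\gamma} - \gamma) + (X'X)^{-1}X'\varepsilon \tag{8}$$

where

$$\varepsilon = y - WH \tag{9}$$

and $D = I - X(X'X)^{-1}X'$.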

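Then, taking probability limits, we have:

$$\operatorname{plim}_{N\to\infty}(\hat{\gamma} - \gamma) = \left(\operatorname{plim}_{N\to\infty}\frac{1}{N}y_{-1}'Dy_{-1}\right)^{-1}\left(\operatorname{plim}_{N\to\infty}\frac{1}{N}y_{-1}'D\varepsilon\right) \tag{10}$$

$$\operatorname{plim}_{N\to\infty}(\hat{\beta} - \beta) = -\operatorname{plim}_{N\to\infty}\left[(X'X)^{-1}X'y_{-1}\right]\operatorname{plim}_{N\to\infty}(\hat{\gamma} - \gamma) \tag{11}$$

From (10):

$$\operatorname{plim}_{N\to\infty}\frac{1}{N}y_{-1}'D\varepsilon = \operatorname{plim}_{N\to\infty}\frac{1}{N}y_{-1}'\varepsilon \tag{12}$$

because X is assumed to be strictly exogenous.

Then from (12)

$$\operatorname{plim}_{N\to\infty}\frac{1}{N}y_{-1}'\varepsilon = \lim_{N\to\infty}\frac{1}{N}\sum_{i=1}^{N}E(y_{i,-1}'\varepsilon_i) \tag{13}$$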
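By further decomposition, we obtain

$$E(y_{i,-1}'\varepsilon_i) = \operatorname{tr}(\Pi_T\Sigma_{T,i}) \tag{14}$$

Substituting (14) into (13), we obtain (15) in
accordance with [7] and [14].

$$\operatorname{plim}_{N\to\infty}\frac{1}{N}y_{-1}'\varepsilon = \lim_{N\to\infty}\frac{1}{N}\sum_{i=1}^{N}\operatorname{tr}(\Pi_T\Sigma_{T,i}) = \operatorname{tr}\left(\Pi_T\lim_{N\to\infty}\frac{1}{N}\sum_{i=1}^{N}\Sigma_{T,i}\right) = \operatorname{tr}(\Pi_T\Sigma_T) \tag{15}$$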

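Under the assumption of homoscedasticity, for which
$\Sigma_T = \sigma_\varepsilon^2 I_T$, we have that

$$\operatorname{plim}_{N\to\infty}\frac{1}{N}y_{-1}'\varepsilon = \operatorname{tr}(\Pi_T\sigma_\varepsilon^2 I_T) = \sigma_\varepsilon^2\operatorname{tr}(\Pi_T) \tag{16}$$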
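The trace in (16) can be checked numerically. The Python sketch below rests on an assumption, since the text above does not spell out $\Pi_T$: it takes $\Pi_T = C'A$, where $A = I_T - \iota\iota'/T$ is the within transformation and $C$ is the strictly lower triangular matrix with entries $\gamma^{t-1-s}$ for $s < t$, which is what a stationary first order autoregression without covariates implies. The function names are hypothetical.

```python
import numpy as np

def trace_pi(gamma, T):
    """tr(Pi_T) with the ASSUMED definition Pi_T = C'A, where A is the
    within transformation and C[t, s] = gamma**(t-1-s) for s < t."""
    A = np.eye(T) - np.ones((T, T)) / T
    idx = np.arange(T)
    lag = idx[:, None] - idx[None, :] - 1          # equals t - 1 - s
    C = np.where(lag >= 0, gamma ** np.clip(lag, 0, None), 0.0)
    return np.trace(C.T @ A)

def trace_pi_closed_form(gamma, T):
    """Closed form of the same trace, obtained by summing the geometric series."""
    return -(T - 1 - T * gamma + gamma ** T) / (T * (1 - gamma) ** 2)

# The two settings of T used in the experiment, with gamma = 0.8
tr_10 = trace_pi(0.8, 10)
tr_50 = trace_pi(0.8, 50)
```

Under this assumed definition the trace is negative, so the correlation between the centered lagged dependent variable and the centered error pulls the LSDV estimate of $\gamma$ downwards.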

which would result in the bias approximations below, with a
reasonable level of accuracy:

$$B_1 = c_1(T^{-1}); \quad B_2 = B_1 + c_2(N^{-1}T^{-1}); \quad B_3 = B_2 + c_3(N^{-1}T^{-2}) \tag{17}$$

where $c_1$, $c_2$ and $c_3$ depend on the unknown parameters
$\sigma_\varepsilon^2$ and $\gamma$.

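Substituting (15) into (10), we obtain:

$$\operatorname{plim}_{N\to\infty}(\hat{\gamma} - \gamma) = \frac{\operatorname{tr}(\Pi_T\Sigma_T)}{\sigma^2_{y_{-1}|X}} \tag{18}$$

Also, substituting (14) into (9), we obtain

$$\operatorname{plim}_{N\to\infty}(\hat{\beta} - \beta) = -\xi\operatorname{plim}_{N\to\infty}(\hat{\gamma} - \gamma) \tag{19}$$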


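(18) and (19) are the bias approximations of the LSDV
estimator derived by [7] directly from the data, without
initial resort to any consistent estimator, where

$$\Sigma_{Xy_{-1}} = \operatorname{plim}_{N\to\infty}\frac{1}{N}X'y_{-1}, \quad \Sigma_X = \operatorname{plim}_{N\to\infty}\frac{1}{N}X'X \quad \text{and} \quad \sigma^2_{y_{-1}|X} = (1 - \rho^2_{Xy_{-1}})\sigma^2_{y_{-1}}.$$

$\rho^2_{Xy_{-1}} = \Sigma_{Xy_{-1}}'\Sigma_X^{-1}\Sigma_{Xy_{-1}}/\sigma^2_{y_{-1}}$ is the squared multiple correlation
coefficient of $y_{-1}$ regressed on $X$, and $\xi = \Sigma_X^{-1}\Sigma_{Xy_{-1}}$ is the
corresponding vector of regression coefficients, for $\gamma$ and
$\beta$ unknown.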


III. ROBUST LEAST SQUARES DUMMY VARIABLE (LSDV_R)
ESTIMATOR

The robust least squares dummy variable estimation is
predicated on the derivation of an approximate expression for
the inconsistency of the LSDV, which can be used to correct
the bias of the LSDV. In [5], the robust LSDV estimator is
implemented by finding consistent estimates for
$\sigma_\varepsilon^2$ and $\gamma$ and subtracting any of the bias
approximations $B_c$ in (17) from the LSDV obtained by within
estimation. This gives the robust LSDV estimator, LSDV_R,
below:

$$\mathrm{LSDV}_R = \mathrm{LSDV} - \hat{B}_c, \quad c = 1, 2 \text{ and } 3: \; AH = 1, \; AB = 2 \text{ and } BB = 3 \tag{20}$$

The consistent estimator of $\sigma_\varepsilon^2$ is obtained by

$$\hat{\sigma}_c^2 = \frac{e_c'De_c}{N - K - T}, \quad N > T + K \tag{21}$$

where $e_c = y - W\hat{H}_c$ and c = AH, AB and BB denote the
consistent estimators of $\gamma$.

It is pertinent to point out, at this juncture, that the bias
approximations derived by [1], [2] and [7] assumed
homoscedasticity of the error variance. The additive bias
corrected LSDV estimator by [7], just like that of [2], relied
upon the consistent IV-GMM estimators of $\gamma$ to determine
the bias, such that the bias corrected LSDV estimators are:

$$\hat{\gamma} = \hat{\gamma}_{DV} - \frac{\operatorname{tr}(\hat{\Pi}_{T,GMM}\hat{\Sigma}_{T,GMM})}{\hat{\sigma}^2_{y_{-1}|X}} \tag{22}$$

$$\hat{\beta} = \hat{\beta}_{DV} + \hat{\xi}\,\frac{\operatorname{tr}(\hat{\Pi}_{T,GMM}\hat{\Sigma}_{T,GMM})}{\hat{\sigma}^2_{y_{-1}|X}} \tag{23}$$


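where $\hat{\Pi}_{T,GMM}$ and $\hat{\Sigma}_{T,GMM}$ are the variance structures,
which depend on $\gamma$ and $\beta$ respectively and can be
obtained by their sample equivalents as explained in the
section above, while $\Sigma_T$ can be estimated from:

$$\hat{\Sigma}_{T,GMM} = \operatorname{diag}(\hat{\sigma}^2_{t,GMM}) \tag{24}$$

for which

$$\hat{\sigma}^2_{t,GMM} = \frac{(y_t - \hat{\gamma}_{GMM}y_{t-1} - X_t\hat{\beta}_{GMM})'(y_t - \hat{\gamma}_{GMM}y_{t-1} - X_t\hat{\beta}_{GMM})}{N(T-1)/T} \tag{25}$$

and $y_t$, $y_{t-1}$ and $X_t$ are stacked across individuals such that
$X_t = (x_{1t}, x_{2t}, \ldots, x_{Nt})'$.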

According to [7], to take care of the non-linear bias
correction, the (infeasible) bias corrected estimator for
$\gamma$ is obtained by solving

$$\hat{\gamma}_{DV} = \gamma + \frac{\operatorname{tr}(\Pi_T\Sigma_T)}{\sigma^2_{y_{-1}|X}}$$

for $\gamma$ while assuming, first, that the variance structure
$\Sigma_T$, $\sigma^2_{y_{-1}|X}$ and $\xi$ are given. Since $\Sigma_T$,
$\sigma^2_{y_{-1}|X}$ and $\xi$ are all unknown, consistent estimators
of them are needed [7].
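The idea of solving the bias relation for $\gamma$ can be sketched in Python for the simplest case. The sketch below is a stand-in and not the estimator of [7]: it assumes a pure first order autoregression with no covariates and homoscedastic errors, for which the large-N inconsistency of the LSDV has the well-known closed form of Nickell (1981), and it inverts that relation by bisection. The function names and the bracketing interval are illustrative assumptions.

```python
def nickell_bias(gamma, T):
    """Large-N inconsistency of the LSDV estimate of gamma in a pure AR(1)
    panel with T periods, homoscedastic errors and no covariates
    (Nickell, 1981). Used here only as a stand-in for the bias term."""
    a = 1.0 - (1.0 - gamma ** T) / (T * (1.0 - gamma))
    numerator = -(1.0 + gamma) / (T - 1.0) * a
    denominator = 1.0 - 2.0 * gamma * a / ((1.0 - gamma) * (T - 1.0))
    return numerator / denominator

def invert_bias(gamma_lsdv, T, lo=-0.99, hi=0.99, tol=1e-12):
    """Solve gamma + bias(gamma) = gamma_lsdv for gamma by bisection."""
    def f(g):
        return g + nickell_bias(g, T) - gamma_lsdv
    f_lo, f_hi = f(lo), f(hi)
    if f_lo * f_hi > 0:
        raise ValueError("no root bracketed for this gamma_lsdv")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        f_mid = f(mid)
        if f_lo * f_mid <= 0:
            hi = mid
        else:
            lo, f_lo = mid, f_mid
    return 0.5 * (lo + hi)

gamma_true, T = 0.8, 10
gamma_lsdv = gamma_true + nickell_bias(gamma_true, T)   # large-N limit of the LSDV estimate
gamma_corrected = invert_bias(gamma_lsdv, T)
```

In practice `gamma_lsdv` would be the within estimate from the data, and the bias function would be replaced by the estimated trace term divided by the estimated conditional variance.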

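By a system of K equations, the bias corrected $\hat{\gamma}$ and
$\hat{\beta}$ are obtained from

$$\hat{\gamma}_{DV} = \hat{\gamma} + \frac{\operatorname{tr}(\hat{\Pi}_T\hat{\Sigma}_T)}{\hat{\sigma}^2_{y_{-1}|X}} \tag{26}$$

$$\hat{\beta}_{DV} = \hat{\beta} - \hat{\xi}\,\frac{\operatorname{tr}(\hat{\Pi}_T\hat{\Sigma}_T)}{\hat{\sigma}^2_{y_{-1}|X}} \tag{27}$$

where $\hat{\Sigma}_T = \operatorname{diag}(\hat{\sigma}_t^2)$, with each $\hat{\sigma}_t^2$
computed from the period-t residuals with divisor $N(T-1)/T$,
as in (25).
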

IV. DESIGN OF THE EXPERIMENT 

To provide for an effective comparison of the performance of
the robust LSDV estimator (LSDV_R) against the AH, AB and BB
estimators, we generated $y_{it}$ as defined in (1), i.e.
$y_{it} = x_{it}'\beta + \gamma y_{i,t-1} + \alpha_i + \varepsilon_{it}$, a simple
dynamic panel data model with fixed effects, i.e. having time
invariant individual or group specific effects, $\alpha_i$. We
generated $x_{it}$ using the generating equation
$x_{it} = \rho x_{i,t-1} + e_{it}$, $e_{it} \sim N(0,1)$, $|\rho| < 1$,
making provision for an outlier.

We specified $\rho = 0.8$, $\gamma = 0.8$ and $\beta = 0.2$ in the
general model and considered two specifications: (1) large
samples in the time dimension (i.e. n=11 and t=50) and (2)
finite samples (i.e. n=51 and t=10). The start-up values
$y_{i0}$ and $x_{i0}$ are obtained using the procedures by [17]. For
$\alpha_i = \theta_i y_{it} + \phi_{it}$, $\phi_{it} \sim N(0,1)$, we fix
$\theta_i$ at 0.8 as in [18]. We then used the within estimation
of (2) to obtain the LSDV parameter estimators, which are
$\hat{\gamma}_{DV}$ and $\hat{\beta}_{DV}$. The experiment is replicated five
hundred (500) times. The root mean square error (RMSE) of the
estimated model, the root mean square error (RMSE $\gamma$) of
the estimated autoregressive coefficient, as well as the Akaike
Information Criterion (AIC), are employed for model
comparisons. We employed the Stata 10.0, Excel 2007 and
Minitab statistical packages in the analysis to cushion the
cumbersome nature of some estimators.
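The data generating process described in this section can be sketched in Python as follows. The sketch is only an approximation of the design: the individual effects are drawn as standard normal, the start-up values come from a burn-in period rather than the procedure of [17], and the outlier is a single additive shock to one observation. These choices, together with the names `simulate_panel` and `rmse`, are assumptions made for illustration.

```python
import numpy as np

def simulate_panel(N, T, gamma=0.8, beta=0.2, rho=0.8, burn_in=50,
                   outlier_size=0.0, seed=0):
    """Simulate y_it = gamma*y_{i,t-1} + beta*x_it + alpha_i + eps_it with
    x_it = rho*x_{i,t-1} + e_it. Returns arrays of shape (N, T + 1);
    column 0 holds the start-up values."""
    rng = np.random.default_rng(seed)
    total = burn_in + T + 1
    alpha = rng.normal(size=N)               # assumed N(0, 1) individual effects
    x = np.zeros((N, total))
    y = np.zeros((N, total))
    for t in range(1, total):
        x[:, t] = rho * x[:, t - 1] + rng.normal(size=N)
        y[:, t] = gamma * y[:, t - 1] + beta * x[:, t] + alpha + rng.normal(size=N)
    y, x = y[:, burn_in:].copy(), x[:, burn_in:].copy()   # drop the burn-in periods
    y[0, T // 2] += outlier_size             # one additive outlier (assumed form)
    return y, x

def rmse(estimates, truth):
    """Root mean square error of replicated estimates around the true value."""
    estimates = np.asarray(estimates, dtype=float)
    return float(np.sqrt(np.mean((estimates - truth) ** 2)))

# Specification 1 of the experiment (n = 11, t = 50), with and without an outlier
y_clean, x_clean = simulate_panel(11, 50, seed=4)
y_out, x_out = simulate_panel(11, 50, outlier_size=10.0, seed=4)
```

Repeating the simulation over many seeds, estimating $\gamma$ on each replicate and passing the estimates to `rmse` gives a measure of the same kind as the RMSE $\gamma$ used for the comparisons.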

V. RESULTS AND DISCUSSIONS 

The results of the simulation analysis for the various
estimators considered are shown in Table 1 below, at the two
specifications of time and cross-sections, for
$\rho = \gamma = 0.8$ and $\beta = 0.2$.


Table 1. Simulation analysis of the various estimators of the dynamic panel data model in the presence of outliers

             1. n = 11, t = 50                          2. n = 51, t = 10
Estimator    RMSE      AIC      BIAS γ     RMSE γ       RMSE      AIC      BIAS γ     RMSE γ
LSDV_R       0.01109   1.308    0.00102    0.00121      0.00101   0.1601   0.0124     0.0106
BB           0.01201   1.4529   -0.0061    0.00695      0.01038   0.198    0.00224    0.0057
AB           0.01245   1.5616   -0.0076    0.0089       0.0113    0.1952   0.0023     0.0068
AH           0.01114   1.351    0.01645    0.01764      0.01113   0.1937   0.0038     0.0098
LSDV         0.0133    1.5212   0.01147    0.00143      0.01374   0.1765   0.0206     0.0401



From the results of the study, the Blundell and Bond (BB)
generalized method of moments (GMM) consistent estimator
recorded the minimum error in the autoregressive term, with
RMSE γ = 0.0057, in finite samples with a large number of
cross-sections (n=51) and a finite period of time (t=10), i.e.
specification 2. This supports the results of [2], [7], [12]
and [5]. The robust least squares dummy variable estimator
(LSDV_R) showed high predictive power by returning the least
values of RMSE = 0.00101 and AIC = 0.1601 at the same
specification 2, as well as the least values in specification
1. The robust least squares dummy variable estimator (LSDV_R)
also recorded the minimum error in the autoregressive term,
with RMSE γ = 0.00121, in specification 1. The error values are
generally lower in specification 2 than in specification 1.
The unsteady nature of the Arellano and Bond (AB) consistent
estimator, which led to the introduction of the BB system GMM
estimator, is seen in its higher RMSEs of 0.01245 and 0.0113 in
specifications 1 and 2 respectively, relative to the other
consistent estimators. However, the LSDV_R recorded the minimum
RMSE of 0.00101 in finite samples with a large cross-section
(n=51 and t=10), which is in agreement with the report of [7].
The bias of the least squares dummy variable estimator is
approximated using the Blundell and Bond (BB) estimator, and
its effect on the results of the robust least squares dummy
variable estimator is quite glaring, as the LSDV_R produced the
minimum error in large samples with a small number of
cross-sections (n=11) and a long time period (t=50). It is
observed that, even in the presence of outliers, the predictive
power of the robust least squares dummy variable estimator
(LSDV_R) is higher than that of the other competing models, and
it is the most efficient except in the finite sample, where the
BB is the most efficient model in estimating the autoregressive
coefficient.